Abstract: Classification is used to construct a
representation for categorizing a set of pages. It is the
process of mapping a page into one of several
predetermined classes. In the Web domain, classification
approaches are used to develop a profile of users who
belong to a specific group and use particular server
files. This in turn requires mining and collecting
features based on the demographic data available on
users or on their access patterns. The Dominance Fuzzy
Clustering and Distributed Probability Graph (DFC-DPG)
framework, the Map Reduce Pearson Correlation Fisher's
Linear Discriminant Classifier (MPC-FLDC) technique and
the Poisson Fragment Frequency based Web Pattern
Clustering (PFF-WPC) technique are implemented in Java
using the Apache log samples dataset. In the
experimental evaluation on this dataset, web traffic
patterns are mined with the goal of tracking the
location of the web user. The proposed techniques are
compared with existing methods.

Keywords: Web Data, Clustering, Classifier,
Space complexity.


I. INTRODUCTION 

Web traffic generally originates from web browsers. A
traffic flow begins with a mouse click that sends
browser information to a server, which applies
programmed rules and techniques to interpret the
user's browser request. Depending on these rules, the
server chooses the type of action required. Web
traffic analysis is performed to maintain and classify
the traffic. It also improves the workload management
ability of the web server.

User communities are created from data collected by
Web proxies while users browse the Web. Many hybrid
representations have been designed over time as search
engines integrated directory features to address
problems such as categorization and site quality. The
key objective is to recognize the behavioral patterns
in the collected usage data and to build community Web
directories based on those patterns. The process of
collecting the patterns from the data for use in web
directories is called Usage Data Preparation.

Web traffic analysis applications deal with large
amounts of data. Web traffic analysis is employed to
extract information and evaluate performance, which
requires efficient input preprocessing. The data in
server logs are irregular, unrelated, noisy and partly
unnecessary for the application of interest.
Preprocessing of the input data and feature collection
approaches are used to choose suitable attributes.
Data networking helps organizations do business by
enabling companies to communicate better with
employees, customers and distributors.

II. RELATED WORKS 


Marios Belk et al. [1] described a user modeling
mechanism for inferring users' cognitive styles from
navigation patterns and clickstream data. The grouping
of users by measures obtained from psychometric tests
and by content navigation behavior was examined with
the aid of clustering methods. Navigation metrics were
also employed to identify specialized user groups with
similar navigation patterns associated with cognitive
style. A psychometric-based assessment was performed
to mine the users' cognitive styles. However, the true
positive rate was not calculated, which limits the
user modeling mechanism.

Mohammad Amin Omidvar et al. [2] considered the
effect of distinct variables on diverse dependent
variables. These variables are time series, and a time
series regression is discussed. The time series
regression is a significant and primary index in
Google Analytics. However, the most appropriate data
for obtaining true positive rate outcomes was not
addressed.

Neha Goel et al. [3] used Web Log Expert to discover
the behavior of users accessing an astrology website.
A comparison of the available log analyzer tools was
performed. Web log analyzer tools are a category of
web analytics software that accepts a log file as
input and examines it to produce results. Web Log
Expert processed the web logs of the website, and the
results were analyzed for inclusion in the user's
website. This in turn helped in recognizing customer
behavior. However, the discovered user behavior was
not accurate.

III. METHODOLOGY

To overcome the limitations of the existing methods,
a method for performing effective web mining is
proposed.

3.1 Dominance Fuzzy Clustering and 
Distributed Probability Graph 
Framework 

The Dominance Fuzzy Clustering and Distributed
Probability Graph (DFC-DPG) framework is developed
with the aim of extracting the similar web pages
visited by a user with improved clustering efficiency,
lower latency and lower space complexity. The
implementation of the proposed DFC-DPG framework
contains four phases: web user data collection, the
dominance rank model, the fuzzy clustering approach
and the Distributed Probability Graph Arc model. The
DFC-DPG framework is used to discover information
about the activities of web users from the weblog
database. Initially, the web user data are collected
from server log files. The Dominance Rank model in the
proposed DFC-DPG framework then separates the relevant
web user data from the irrelevant data. Following
this, fuzzy clustering is carried out on the relevant
data to form clusters that contain the users with
similar access sequences. Finally, the Distributed
Probability Graph Arc (DPG) model examines the access
history of a web user to predict the user's future
accesses. With this model, cache utilization and
latency are reduced significantly. The phases of the
proposed DFC-DPG framework are described below.
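
The prediction step of the DPG model can be illustrated
with a small sketch. The paper does not give the graph
construction in detail, so the code below assumes a
first-order graph in which each arc counts how often one
page was requested directly after another, and the
predicted next access is the most frequent outgoing arc.
The class name AccessGraph, its methods and the page
names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Illustrative access graph: arcs weighted by observed page-to-page transitions. */
final class AccessGraph {
    // arcCounts.get(a).get(b) = times page b was requested right after page a
    private final Map<String, Map<String, Integer>> arcCounts = new HashMap<>();

    /** Adds one user session, given as the ordered list of requested pages. */
    void addSession(List<String> pages) {
        for (int i = 0; i + 1 < pages.size(); i++) {
            arcCounts.computeIfAbsent(pages.get(i), k -> new HashMap<>())
                     .merge(pages.get(i + 1), 1, Integer::sum);
        }
    }

    /** Estimated probability of the arc from -> to (0 if 'from' was never left). */
    double probability(String from, String to) {
        Map<String, Integer> out = arcCounts.get(from);
        if (out == null) {
            return 0.0;
        }
        int total = 0;
        for (int count : out.values()) {
            total += count;
        }
        return out.getOrDefault(to, 0) / (double) total;
    }

    /** Most likely next page after 'current', if any transition was observed. */
    Optional<String> predictNext(String current) {
        Map<String, Integer> out = arcCounts.get(current);
        if (out == null) {
            return Optional.empty();
        }
        return out.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        AccessGraph graph = new AccessGraph();
        graph.addSession(Arrays.asList("/home", "/products", "/cart"));
        graph.addSession(Arrays.asList("/home", "/products", "/checkout"));
        graph.addSession(Arrays.asList("/home", "/about"));
        graph.addSession(Arrays.asList("/products", "/cart"));
        System.out.println(graph.predictNext("/home").orElse("(none)"));
        System.out.println(graph.probability("/home", "/products"));
    }
}
```

A page predicted in this way could be fetched into the
cache ahead of the request, which is one plausible
reading of how the model reduces latency.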

3.2 Web User Information Collection 

In the proposed DFC-DPG framework, web user
information collection is carried out as the first
step. The framework collects the information of web
users through the server log files from the web
server database. The web server log file is a text
file that contains one line for each web user request.
Every line in the log file includes the host making
the request, the timestamp, the requested URL, the
HTTP reply code and the number of bytes in the reply.
A second file, the Parse Log, is obtained from the web
server log file. It contains the IP address, hostname,
date, time and request. These data are stored in a web
database so that they can be handled effectively.
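
As an illustration of this collection step, the sketch
below extracts the fields listed above from one log
line. It assumes the log is in the Apache Common Log
Format; the paper does not state the exact format, and
the class names LogEntry and AccessLogParser are
hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed log line; field names are illustrative. */
final class LogEntry {
    final String host;
    final String timestamp;
    final String method;
    final String url;
    final int status;
    final long bytes;

    LogEntry(String host, String timestamp, String method,
             String url, int status, long bytes) {
        this.host = host;
        this.timestamp = timestamp;
        this.method = method;
        this.url = url;
        this.status = status;
        this.bytes = bytes;
    }
}

final class AccessLogParser {
    // Common Log Format: host ident user [time] "method url protocol" status bytes
    private static final Pattern CLF = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) (\\d+|-)");

    /** Returns the parsed entry, or empty if the line does not match the format. */
    static Optional<LogEntry> parse(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) {
            return Optional.empty(); // malformed line, treated as noise
        }
        // A reply without a body is logged as "-" instead of a byte count.
        long bytes = m.group(6).equals("-") ? 0L : Long.parseLong(m.group(6));
        return Optional.of(new LogEntry(m.group(1), m.group(2), m.group(3),
                m.group(4), Integer.parseInt(m.group(5)), bytes));
    }

    public static void main(String[] args) {
        String line = "192.168.1.10 - - [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /index.html HTTP/1.0\" 200 2326";
        LogEntry e = parse(line).get();
        System.out.println(e.host + " " + e.url + " " + e.status + " " + e.bytes);
    }
}
```

Lines that do not match are returned as empty, so a
caller can drop them during preprocessing rather than
storing partial records in the web database.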

3.3 Linear-Temporal Logic Model Checking 
Approach 

Sergio Hernandez [16] developed a linear-temporal
logic (LTL) model checking approach for investigating
structured e-commerce web logs. By mapping the log
records onto the e-commerce structure, the web logs
were converted into event logs to extract user
behavior. Various predefined queries were developed to
recognize the behavioral patterns of a user during
sessions. Certain enhancements to the website design
were made to improve its performance. The product
classification and the potential of users helped in
analyzing website navigation with respect to such
associations.

A number of query patterns were translated into LTL
formulas to enable the extraction of significant
correlations among the sequences of events arising
from user behaviour. This helped in identifying how
the various website sections are visited and how
navigational patterns are associated with buying
actions. Several problems and issues with respect to
product categorization and the organization of website
sections were resolved, and enhancements were made.
The LTL model was also capable of executing in
parallel on multiple servers. However, it was not
sufficient for effective traffic pattern mining for
web user tracking.

3.4 Proposed Technique 

The web usage mining approach was implemented to
predict the online navigational behavior of web users,
but it failed to predict web traffic patterns at the
required level. A later method aimed to provide better
results in web usage pattern detection through
client-side logging, but it failed to reduce the time
needed to detect web usage patterns. Hence, the
proposed Map Reduce Pearson Correlation Fisher's
Linear Discriminant Classifier (MPC-FLDC) technique is
introduced with the objective of effectively
predicting web traffic patterns from the weblog
database with improved accuracy and in less time. In
the proposed technique, the frequent and non-frequent
web patterns in the weblog database are classified
with higher accuracy by using Fisher's Linear
Discriminant Classifier. Pearson Correlation Analysis
is then used to predict the web traffic patterns with
reduced time consumption.
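
The classification step can be sketched as a two-class
Fisher's linear discriminant. The paper does not say
which features describe a web pattern, so the sketch
assumes two hypothetical numeric features per pattern
(for example, hits per day and number of distinct
sessions). It computes the projection direction
w = Sw^-1 (m1 - m0) from the class means and the
within-class scatter, and places the decision threshold
at the projected midpoint of the two means. The class
name FisherDiscriminant and the sample values are
illustrative, and the Map Reduce distribution of the
work is not shown.

```java
/** Illustrative two-class Fisher's linear discriminant on two features. */
final class FisherDiscriminant {
    private final double w0;
    private final double w1;
    private final double threshold;

    private FisherDiscriminant(double w0, double w1, double threshold) {
        this.w0 = w0;
        this.w1 = w1;
        this.threshold = threshold;
    }

    /** Each row is one pattern: {feature1, feature2}. */
    static FisherDiscriminant fit(double[][] frequent, double[][] nonFrequent) {
        if (frequent.length == 0 || nonFrequent.length == 0) {
            throw new IllegalArgumentException("both classes need samples");
        }
        double[] m1 = mean(frequent);
        double[] m0 = mean(nonFrequent);
        double[] s1 = scatter(frequent, m1);
        double[] s0 = scatter(nonFrequent, m0);
        // Within-class scatter Sw = [[a, b], [b, d]], summed over both classes.
        double a = s1[0] + s0[0];
        double b = s1[1] + s0[1];
        double d = s1[2] + s0[2];
        double det = a * d - b * b;
        if (Math.abs(det) < 1e-12) {
            throw new IllegalArgumentException("within-class scatter is singular");
        }
        // w = Sw^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix.
        double gx = m1[0] - m0[0];
        double gy = m1[1] - m0[1];
        double w0 = (d * gx - b * gy) / det;
        double w1 = (a * gy - b * gx) / det;
        // Threshold: projection of the midpoint between the two class means.
        double threshold = 0.5 * (w0 * (m0[0] + m1[0]) + w1 * (m0[1] + m1[1]));
        return new FisherDiscriminant(w0, w1, threshold);
    }

    /** True if the projected pattern falls on the "frequent" side. */
    boolean isFrequent(double f1, double f2) {
        return w0 * f1 + w1 * f2 > threshold;
    }

    private static double[] mean(double[][] points) {
        double sx = 0.0;
        double sy = 0.0;
        for (double[] p : points) {
            sx += p[0];
            sy += p[1];
        }
        return new double[] {sx / points.length, sy / points.length};
    }

    /** Returns {sum dx*dx, sum dx*dy, sum dy*dy} around the mean m. */
    private static double[] scatter(double[][] points, double[] m) {
        double a = 0.0;
        double b = 0.0;
        double d = 0.0;
        for (double[] p : points) {
            double dx = p[0] - m[0];
            double dy = p[1] - m[1];
            a += dx * dx;
            b += dx * dy;
            d += dy * dy;
        }
        return new double[] {a, b, d};
    }

    public static void main(String[] args) {
        double[][] frequent = {{40, 12}, {55, 15}, {48, 10}, {60, 18}};
        double[][] nonFrequent = {{3, 2}, {5, 1}, {8, 4}, {2, 3}};
        FisherDiscriminant fld = fit(frequent, nonFrequent);
        System.out.println(fld.isFrequent(50, 14));
        System.out.println(fld.isFrequent(4, 2));
    }
}
```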


The proposed MPC-FLDC technique analyzes web traffic
patterns in three phases: preprocessing, Fisher's
Linear Discriminant Classifier and Pearson Correlation
Analysis. After performing the web pattern
classification, the technique carries out Pearson
Correlation Analysis to achieve effective web traffic
pattern prediction (daily/hourly traffic) in the
weblog database. The web traffic patterns are the web
pages that are browsed most often by a web user. From
the classified frequent web patterns, the web traffic
patterns are predicted by using Pearson Correlation
Analysis, which estimates the degree of correlation of
web pages among different sessions.


The Pearson correlation measures the similarity of
web pages between user sessions. From the determined
degree of correlation, i.e. similarity, the daily and
hourly traffic volumes are predicted efficiently. As a
result, the prediction rate is improved when
performing web traffic pattern mining on the weblog
database.
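
The similarity computation can be sketched with the
standard Pearson correlation coefficient. The sketch
assumes that each session is represented as a vector of
visit counts over the same ordered list of web pages;
this representation, the class name PearsonCorrelation
and the sample counts are assumptions, not details
taken from the paper.

```java
/** Illustrative Pearson correlation between two sessions' page-visit counts. */
final class PearsonCorrelation {

    /**
     * Returns r in [-1, 1]. x[i] and y[i] are the visit counts of the same
     * page i in two sessions. Returns 0 when either vector is constant,
     * because the coefficient is undefined in that case.
     */
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two vectors of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0.0 || varY == 0.0) {
            return 0.0;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Visit counts for four pages in two hypothetical sessions.
        double[] sessionA = {5, 3, 0, 1};
        double[] sessionB = {10, 6, 0, 2};
        System.out.println(correlation(sessionA, sessionB));
    }
}
```

A coefficient close to 1 indicates that the two
sessions concentrate on the same pages, a value close
to -1 that they favor opposite pages, and a value near
0 that their page usage is unrelated.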


IV. EXPERIMENTATION AND RESULTS


The clustering framework is implemented in Java
using the Apache log samples dataset. The dataset
records the access activities of several web users,
namely the IP address, date, time of access, port
number and accessed web page. The tables and graphs
are generated from the performance values obtained in
the experiments to assess the effectiveness of the
proposed technique.


4.1 Performance Analysis of Space Complexity 

The space complexity is defined as the 
amount of memory space required to store the 
similar web pages from the web server log files. 
The space complexity is measured as the 
difference between the entire memory space and 
the unused memory space on weblog database. 
The mathematical expression of space 
complexity is given as 

$$ SC = \text{Total memory space} - \text{Unused memory space} \tag{4.1} $$

In equation (4.1), the space complexity is denoted
by ‘SC’ and is measured in megabytes (MB). A lower
value of space complexity indicates better
performance of the DFC-DPG framework.
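
Equation (4.1) can be evaluated directly in Java. The
sketch below is one possible way to take the
measurement and is not necessarily how the reported
values were obtained: it applies the equation to byte
counts and, as an assumption, reads the total and free
heap of the running JVM through the standard Runtime
API. The class and method names are hypothetical.

```java
/** Illustrative evaluation of equation (4.1) in megabytes. */
final class SpaceComplexity {
    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** SC = total memory space - unused memory space, converted to MB. */
    static double usedMegabytes(long totalBytes, long unusedBytes) {
        if (totalBytes < 0 || unusedBytes < 0 || unusedBytes > totalBytes) {
            throw new IllegalArgumentException("expected 0 <= unused <= total");
        }
        return (totalBytes - unusedBytes) / BYTES_PER_MB;
    }

    /** Applies (4.1) to the heap currently reserved by this JVM. */
    static double currentHeapUsageMb() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory();
        // The two readings are not atomic, so cap free at total.
        long free = Math.min(rt.freeMemory(), total);
        return usedMegabytes(total, free);
    }

    public static void main(String[] args) {
        // Hypothetical figures: 64 MB reserved, 50 MB unused.
        System.out.println(usedMegabytes(64L * 1024 * 1024, 50L * 1024 * 1024));
        System.out.println(currentHeapUsageMb());
    }
}
```

Heap readings vary with garbage collection, so a
measurement of this kind is usually averaged over
several runs.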

V. CONCLUSION

The proposed framework performs web user data
analysis effectively. The fuzzy clustering approach is
carried out on the relevant user data to form clusters
of web pages with similar user interest. The DPG model
converts the web user session into a graph, which
reduces latency. The experimental results show that
the proposed DFC-DPG framework groups the web pages of
similar user interest with improved clustering
efficiency compared with other existing methods. The
PFF-WPC technique is implemented to track the web user
location by mining web traffic patterns. Through the
Poisson fragment process, the web pages are grouped by
session, which enables effective web user tracking.
Through frequency-based web pattern clustering,
frequent and non-frequent web patterns are clustered
from the web pages. For the detected frequent web
patterns, temporal similarity is determined to find
the web traffic patterns. With the proposed technique,
clustering efficiency is improved and computational
complexity is reduced compared with existing methods.


Table 4.1 Space Complexity (MB)

Number of      DFC-DPG      LTL based    Fuzzy        Proposed
web patterns   framework    model        Clustering   technique

30             16           18           23           14
60             21           22           28           19
90             22           24           29           20
120            23           26           31           21
150            28           29           36           26
180            29           32           38           27
210            30           33           40           28
240            31           34           42           29
270            32           35           43           30
300            33           36           45           31

Fig. 4.1 Space Complexity 



The space complexity results for different numbers
of web patterns are shown in Table 4.1. In the
experiment, the number of web patterns taken as input
ranges from 30 to 300. The proposed technique is then
compared with the existing methods on space
complexity. Table 4.1 shows that the proposed
framework needs less memory space to store the web
pages than the other existing methods. For example, at
300 web patterns the proposed technique uses 31 MB,
compared with 33 MB for the DFC-DPG framework, 36 MB
for the LTL based model and 45 MB for Fuzzy
Clustering. Fig. 4.1 illustrates the space complexity
against the number of web patterns for the proposed
framework and the three existing methods.